Research on Cassandra Data Compaction Strategies for Time-Series Data
Abstract
The storage and analysis of time-series data is a subject of intense interest in current international database research. Time-series data, a sequence of data points collected at fixed time intervals, is an important basis for business analysis and forecasting. Cassandra, a widely used NoSQL database, is often chosen to store time-series data because its data model suits this workload. In real applications, the life cycle of time-series data is usually managed by setting a TTL; expired data is not deleted immediately but is removed later, during compaction. This paper examines how different compaction strategies affect time-series data storage, studying three Cassandra strategies: Size-Tiered Compaction Strategy, Leveled Compaction Strategy, and Date-Tiered Compaction Strategy. The strategies are compared in tests based on stable data storage, recording speed, the number of sorted string table (SSTable) files, and other metrics. Finally, the experiments identify the compaction strategies suited to time-series application scenarios.
Similar resources
Bigtable Merge Compaction
We initiate the formal study of the online stack-compaction policies used by big-data NoSQL databases such as Google Bigtable, Hadoop HBase, and Apache Cassandra. We propose a deterministic policy, show that it is optimally competitive, benchmark it against Bigtable’s default policy, and suggest five interesting open problems.
K-Slot SSTable Stack Compaction
We initiate the formal study of the online stack-compaction policies used by big-data NoSQL databases such as Google Bigtable, Hadoop HBase, and Apache Cassandra. We propose a deterministic policy, show that it is optimally competitive, benchmark it against Bigtable’s default policy, and suggest five interesting open problems.
Lightweight Indexing for Log-Structured Key-Value Stores
The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has driven the proliferation of log-structured key-value stores, represented by Google’s BigTable [1], Apache HBase [2] and Cassandra [3]. While providing key-based data access with a Put/Get interface, these key-value stores do not support value-based access methods, which...
On the Detection of Trends in Time Series of Functional Data
A sequence of functions (curves) collected over time is called a functional time series. Functional time series analysis is one of the popular research areas in which statistics from such data are frequently observed. The main purpose of the functional time series is to predict and describe random mechanisms that resulted in generating the data. To do so, it is needed to decompose functional ti...
Fitting of Count Time Series Models on the Number of Patients Referred to Addiction Treatment Centers in Semnan County
Abstract. Count data over time are observed in many application areas. Many researchers use time series patterns to analyze these data. In this paper, Poisson and negative binomial linear count time series models with explanatory variables are studied for this type of data. The Likelihood analysis and the evaluation of count time series model based on generalized linear models are pres...
Journal title:
- JCP
Volume 11, Issue
Pages -
Publication date: 2016